AFRL-AFOSR-VA-TR-2016-0267


IDENTIFYING  DECEPTIVE  SPEECH  ACROSS  CULTURES 


Julia  Hirschberg 

THE  TRUSTEES  OF  COLUMBIA  UNIVERSITY  IN  THE  CITY  OF  NEW  YORK 
116TH  AND  BDWY 
NEW  YORK,  NY  10027 


07/27/2016 
Final  Report 


DISTRIBUTION  A:  Distribution  approved  for  public  release. 


Air  Force  Research  Laboratory 
AF  Office  Of  Scientific  Research  (AFOSR)/RTA2 




REPORT  DOCUMENTATION  PAGE 


Form  Approved 
OMB  No.  0704-0188 


The  public  reporting  burden  for  this  collection  of  information  is  estimated  to  average  1  hour  per  response,  including  the  time  for  reviewing  instructions,  searching  existing  data  sources,  gathering  and 
maintaining  the  data  needed,  and  completing  and  reviewing  the  collection  of  information.  Send  comments  regarding  this  burden  estimate  or  any  other  aspect  of  this  collection  of  information,  including 
suggestions  for  reducing  the  burden,  to  the  Department  of  Defense,  Executive  Service  Directorate  (0704-0188).  Respondents  should  be  aware  that  notwithstanding  any  other  provision  of  law,  no 
person  shall  be  subject  to  any  penalty  for  failing  to  comply  with  a  collection  of  information  if  it  does  not  display  a  currently  valid  OMB  control  number. 

PLEASE  DO  NOT  RETURN  YOUR  FORM  TO  THE  ABOVE  ORGANIZATION. 


1. REPORT DATE (DD-MM-YYYY)
25-06-2016

2. REPORT TYPE
Final Report

3. DATES COVERED (From - To)
15-09-2011 to 14-05-2016

4. TITLE AND SUBTITLE
IDENTIFYING DECEPTIVE SPEECH ACROSS CULTURES

5a. CONTRACT NUMBER

5b. GRANT NUMBER
FA9550-11-1-0120

5c. PROGRAM ELEMENT NUMBER

5d. PROJECT NUMBER

5e. TASK NUMBER

5f. WORK UNIT NUMBER

6. AUTHOR(S)
Hirschberg, Julia Bell

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)
TRUSTEES OF COLUMBIA UNIVERSITY IN THE CITY OF NEW YORK
SPONSORED PROJECTS ADMINISTRATION

8. PERFORMING ORGANIZATION REPORT NUMBER

9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)
USAF, AFRL DUNS 143574726
AF OFFICE OF SCIENTIFIC RESEARCH
875 NORTH RANDOLPH STREET, RM 3112
ARLINGTON VA 22203

10. SPONSOR/MONITOR'S ACRONYM(S)
AFOSR/PKR3

11. SPONSOR/MONITOR'S REPORT NUMBER(S)

12. DISTRIBUTION/AVAILABILITY STATEMENT
DISTRIBUTION A: Distribution approved for public release.


14.  ABSTRACT 

We have completed our collection of deceptive and non-deceptive speech recorded from interviews between native speakers of Mandarin and of
English, who were instructed to answer truthfully or to lie about 24 biographical questions. Subjects were rewarded or penalized financially for their ability to
lie (interviewee) or to distinguish truth from lie (interviewer); each subject played both roles. At 125 hours (174 subjects), this is by far the largest cleanly
recorded speech corpus of its kind. From this data, we find that ability to lie is significantly correlated with ability to detect deception. We also find
significant correlations of deception ability with personality factors (extraversion, conscientiousness). Using acoustic-prosodic features, gender,
ethnicity, and personality information, our machine learning experiments can classify truth vs. lie in our data with 65% accuracy; we expect better
results when we include lexical features. Surprisingly, using only 3-4 minutes of norming data collected from each subject before the truth/lie interviews,
and including lexical and acoustic-prosodic features, together with gender, ethnicity, and personality scores, we are able to predict ability


15.  SUBJECT  TERMS 


16.  SECURITY  CLASSIFICATION  OF: 

a.  REPORT 

b.  ABSTRACT 

c.  THIS  PAGE 

17. LIMITATION OF ABSTRACT

18. NUMBER OF PAGES

19a. NAME OF RESPONSIBLE PERSON

Benjamin Knott

19b.  TELEPHONE  NUMBER  (Include  area  code) 

703-696-1142 



Standard  Form  298  (Rev.  8/98) 

Prescribed  by  ANSI  Std.  Z39.18 




Final  Report:  “IDENTIFYING  DECEPTIVE  SPEECH  ACROSS  CULTURES” 
PI:  Julia  Hirschberg,  6/25/2016 


We have completed our collection of deceptive and non-deceptive speech recorded from
interviews between native speakers of Mandarin and of English. We are now finishing work that
uses this data to build classifiers that automatically distinguish truth from lie using
speech features, gender, ethnicity, and personality inventory information.

Experimental Design: Subjects were brought into the lab and given a demographic survey to assess
age, ethnicity, and years of English study. They were then asked to answer 24 biographical
questions (e.g. “What is your mother’s occupation?”, “Who was the last person you were in a
physical fight with?”) truthfully. We then chose 12 of these questions and asked each subject to prepare a
false answer, which we checked to make sure it differed sufficiently from the truth. Subjects
were then interviewed individually in a sound booth to obtain pre-interview “norming” speech
data. We also administered the NEO-FFI Five Factor personality inventory to each subject.
Subjects then entered the booth again, where they took turns interviewing one another about the
biographical questionnaire. They were separated in the booth by a curtain. Interviewers were
asked to judge truth or lie for each of the 24 questions and to write down their confidence in
each judgment. Interviewees were asked to indicate, for each statement they made in the
interview, whether that statement contained any false information by pressing a key on the
keyboard in front of them. Subjects were rewarded or penalized financially for their ability to lie
(interviewee) or to distinguish truth from lie (interviewer). At 125 hours (174 subjects), this is by far
the largest cleanly recorded speech corpus of its kind.

Statistical Correlations and Classification Results: From analyzing the speech data we have
collected, we find that ability to lie is significantly correlated with ability to detect deception
(r(280) = 0.12, p = 0.05); this holds across all subjects but is strongest for females (r(140) = 0.24,
p = 0.005). We also find significant correlations of deception ability with personality factors:
extraversion is negatively correlated for English males (r(70) = -0.24, p = 0.04), and there is a
tendency for conscientiousness also to be negatively correlated for English females, while
extraversion tends to be positively correlated for Mandarin females.
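
As a rough check on correlations reported in this form, the Python sketch below converts a Pearson r and its degrees of freedom into a two-tailed p-value. It assumes that the number in parentheses, as in r(280), is the degrees of freedom (N - 2), which is the usual reporting convention but is not stated in this report. The function names are illustrative, and the t-distribution tail is obtained by simple numerical integration rather than by whatever statistics package was actually used.

```python
import math

def t_statistic(r, df):
    """t value for a Pearson correlation r with df = N - 2 degrees of freedom."""
    return r * math.sqrt(df / (1.0 - r * r))

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 minus the central area between -|t| and +|t|.

    The central area is integrated with Simpson's rule (steps must be even).
    """
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2.0 * total * h / 3.0
    return max(0.0, 1.0 - central_area)

# The three (r, df) pairs quoted in the paragraph above.
for r, df in [(0.12, 280), (0.24, 140), (-0.24, 70)]:
    t = t_statistic(r, df)
    print(f"r({df}) = {r:+.2f}  t = {t:+.3f}  p = {two_tailed_p(t, df):.4f}")
```

Under that assumption, the computed p-values should land close to the rounded values quoted in the text.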

Using acoustic-prosodic features (e.g. pitch, intensity, speaking rate, voice quality), gender,
ethnicity, and personality information, our machine learning experiments can classify truth vs. lie
in our data with 65% accuracy; we expect even better results when we include lexical features.
Surprisingly, using only 3-4 minutes of norming data collected from each subject before the truth/lie
interviews, and including lexical and acoustic-prosodic features together with gender, ethnicity,
and personality scores, we are able to predict ability to detect deception with 65% accuracy, compared with a
majority class baseline of 59.9%.
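
To make the comparison against a majority class baseline concrete, here is a minimal Python sketch of how such a baseline and the gain over it are typically computed. The helper names and the toy labels are hypothetical and are not taken from the experiments; only the two figures 0.65 and 0.599 come from the text, and they refer to the prediction of ability to detect deception (no baseline is quoted for the truth vs. lie classifier).

```python
from collections import Counter

def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

def accuracy(predicted, gold):
    """Fraction of predictions that match the gold labels."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

def relative_error_reduction(acc, baseline):
    """Share of the baseline's errors that the classifier removes."""
    return (acc - baseline) / (1.0 - baseline)

# Toy labels, for illustration only.
toy_gold = ["good", "good", "poor", "good", "poor"]
toy_pred = ["good", "poor", "poor", "good", "poor"]
print(majority_baseline(toy_gold), accuracy(toy_pred, toy_gold))

# Figures quoted in the report for predicting ability to detect deception.
reported_acc, reported_base = 0.65, 0.599
print(round(reported_acc - reported_base, 3))
print(round(relative_error_reduction(reported_acc, reported_base), 3))
```

Read this way, the reported result is a gain of about 5 percentage points in accuracy, which removes roughly 13% of the errors the baseline would make.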

We have also found significant differences in interviewers' ability to judge truth vs. lie
depending upon whether the questions asked were yes/no (e.g. “Have you ever been in trouble
with the police?”) vs. open-ended (e.g. “What is the last movie you saw that you really hated?”)
or were sensitive (e.g. “Who ended your last romantic relationship?”) vs. non-sensitive (e.g. “Do
you own an e-reader of any kind?”), with yes/no questions and sensitive questions easier for
interviewers to judge correctly. Finally, we have found that the 3-4 minutes of norming data we
collected before the interviews began can be used to identify gender, ethnicity, and
personality factors, as well as ability to deceive, with considerable accuracy. We have also
found important differences relating to gender and ethnicity, of both interviewer and interviewee, with
respect to ability to deceive successfully and with respect to the type of questions interviewers
find easier to judge correctly.
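
A comparison of judgment accuracy across question types, as described above, can be sketched in Python as follows. The record layout and field names (`form`, `sensitive`, `was_lie`, `judged_lie`) are assumptions made for illustration. The four records are synthetic and carry no information about the corpus or its results.

```python
from collections import defaultdict

def accuracy_by_group(records, key):
    """Return {group value: fraction of correct interviewer judgments}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        group = rec[key]
        total[group] += 1
        # A judgment is correct when it matches the interviewee's own label.
        if rec["judged_lie"] == rec["was_lie"]:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}

# Synthetic records, for illustration only.
toy = [
    {"form": "yes/no", "sensitive": True, "was_lie": True, "judged_lie": True},
    {"form": "yes/no", "sensitive": False, "was_lie": False, "judged_lie": False},
    {"form": "open", "sensitive": True, "was_lie": True, "judged_lie": False},
    {"form": "open", "sensitive": False, "was_lie": False, "judged_lie": False},
]

print(accuracy_by_group(toy, "form"))
print(accuracy_by_group(toy, "sensitive"))
```

The same grouping could be keyed on interviewer or interviewee gender and ethnicity, provided those fields are added to each record.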